Abstract

Genetic Programming (GP) has found various applications. Under- 
standing this type of algorithm from a theoretical point of view is a chal- 
lenging task. The first results on the computational complexity of GP 
have been obtained for problems with isolated program semantics. With 
this paper, we push forward the computational complexity analysis of GP 
on a problem with dependent program semantics. We study the well- 
known sorting problem in this context and analyze rigorously how GP 
can deal with different measures of sortedness. 

ACM Category: F.2, Theory of Computation, Analysis of Algorithms and 
Problem Complexity. 

1 Introduction 

Genetic programming (GP) [5] has proven to be very successful in various fields
such as symbolic regression, financial trading, medicine, biology and
bioinformatics (see e.g. Poli et al. [9]). Various approaches such as schema theory,
Markov chain analysis, and approaches to measure problem difficulty have been
used to understand GP from a theoretical point of view [10].

Poli et al. [10] state, "we expect to see computational complexity techniques
being used to model simpler GP systems, perhaps GP systems based on mu- 
tation and stochastic hill-climbing." Computational complexity analysis has 
significantly increased the theoretical understanding of evolutionary algorithms 
for discrete search spaces. Here, one considers simplified versions of such algo- 
rithms and analyzes them rigorously on certain classes of problems by treating 
them as classical randomized algorithms [§]. This point of view allows one
to use a sophisticated pool of techniques and to treat the algorithms
in a strict mathematical sense. Initial results on the computational complex- 
ity of evolutionary algorithms have been obtained for artificial pseudo-Boolean 
functions QTJ [5] . These results constitute the foundations for later results on 
classical combinatorial optimization, among them some of the most prominent 



problems in computer science such as minimum spanning trees, shortest paths, 
and maximum matchings (see Neumann and Witt for an overview). 

Recently, the first computational complexity results for GP have been obtained
by Durrett et al. [3]. In that paper, the authors consider simple GP
algorithms on problems called ORDER and MAJORITY introduced by Gold- 
berg and O'Reilly [I]. These two problems model isolated problem semantics 
and the analysis constitutes a first step towards obtaining deeper computational 
complexity results for GP. 

Problems with isolated problem semantics are in a sense easy as they allow 
one to treat subproblems independently. The next step would be to consider 
problems that have dependent problem semantics and we follow this path in 
this paper. Our goal is to push forward the computational complexity analysis 
of GP by examining a problem with dependent problem semantics, namely the 
sorting problem. Sorting is one of the most basic problems in computer
science. It is also the first combinatorial optimization problem for which
computational complexity results have been obtained in the area of discrete
evolutionary algorithms [11, 12]. In [12], sorting is treated as an optimization
problem where the task is to minimize the unsortedness of a given permutation
of the input elements. To measure unsortedness, different fitness functions have
been introduced and studied with respect to the difficulty of being optimized 
by permutation-based evolutionary algorithms. 

We consider the simple GP algorithms set up in [3] and analyze them on the
different fitness functions of the sorting problem proposed in [12]. Our analyses
point out how GP algorithms can deal with this problem that has dependent
problem semantics and provide rigorous insights into the optimization process
of our GP systems. As classical GP systems work on tree-based structures and
allow many different solutions to a given problem, our investigations have to be
significantly different from the ones carried out in [12]. One crucial difference is
that elements may occur more than once in a tree. For some of the fitness
functions, this leads to local optima and prevents our GP algorithms from obtaining
an optimal solution in expected polynomial time.

The outline of the paper is as follows. In Section 2, we introduce the
algorithms that are subject to our analysis and present our model of the sorting
problem. Section 3 presents lower bounds on the expected optimization time,
and Section 4 presents upper bounds for sortedness measures that lead to an
efficient optimization process. Section 5 presents worst-case situations and lower
bounds for sortedness measures on which the algorithms may get stuck.
Finally, we finish with some concluding remarks.

2 Definitions 

2.1 Program Initialization 

When considering tree-based genetic programming, a set of primitives A has to 
be selected, where A contains a set F of functions and a set L of terminals. 
The semantics of each primitive is explicitly defined. For example, a primitive
might represent the value bound to an input variable, an arithmetic operation,
or a branching statement such as an IF-THEN-ELSE conditional. Functions are
parameterized, and terminals are either functions with no parameters, i.e. arity 
equal to zero, or input variables to the program that serve as actual parameters 
to the formal parameters of functions. 

For our investigations, we assume that a GP program is initialized in the 
following way: the root node is randomly drawn from A, and subsequently, 
the parameters of each function are recursively populated with random samples 
from A, until the leaves of the tree are all terminals. Thus, functions constitute 
the internal nodes of the parse tree, and terminals occupy the leaf nodes. 
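
The initialization just described can be sketched as follows. This is only a minimal illustration under assumptions that are not fixed at this point of the text: it uses the primitive set introduced later for SORTING (a single binary function J and the terminals 1, ..., n), it draws every node uniformly from A, and the names `random_tree` and `leaves` as well as the tuple encoding are hypothetical.

```python
import random

def random_tree(n, rng):
    """Grow a tree top-down by drawing each node uniformly from A = {J} + {1..n}.

    A leaf is an int in 1..n; a join node J is a 2-tuple (left, right).
    """
    choice = rng.randint(0, n)      # 0 stands for the function J (assumed encoding)
    if choice == 0:
        # J has arity 2, so both parameters are populated recursively.
        return (random_tree(n, rng), random_tree(n, rng))
    return choice                   # a terminal ends the recursion

def leaves(tree):
    """Return the leaves from left to right (in-order parse)."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

rng = random.Random(1)
tree = random_tree(5, rng)
print(tree, leaves(tree))
```

With a uniform draw from A, a join node is chosen with probability 1/(n+1), so for moderate n the initial trees are typically very small.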

2.2 HVL-mutate' 

The HVL-mutate' operator is an update of the HVL mutation operator ([S]) and
is motivated by minimality. The original HVL first selects a node at random in a
copy of the current parse tree. Let us term this the currentNode. It then, with
equiprobability, applies one of three sub-operations: insertion, substitution, or
deletion. Insertion takes place above currentNode: a randomly drawn function 
from F becomes the parent of currentNode and its additional parameters are set 
by drawing randomly from L. Substitution changes currentNode to a randomly 
drawn function of F with the same arity. Deletion replaces currentNode with 
its largest child subtree, which often admits large deletion sub-operations. 

The variation of HVL that we consider here functions slightly differently,
since we restrict it to operate on trees where all functions take two parameters. 
Rather than choosing a node followed by an operation, we first choose one of 
the three sub-operations to perform. Then, the operations proceed as shown in 
Figure 1. Insertion and substitution are exactly as in HVL; however, deletion
only deletes a leaf and its parent to avoid the potentially macroscopic dele- 
tion change of HVL that is not in the spirit of bit-flip mutation. This change 
makes the algorithm more amenable to complexity analysis and specifies an 
operator that is only as general as our simplified problems require, contrasting 
with the generality of HVL, where all sub-operations handle primitives of any 
arity. Nevertheless, both operators respect the nature of GP's search among 
variable-length candidate solutions because each generates another candidate of 
potentially different size, structure, and composition. 

2.3 Algorithms 

We define the genetic programming variant called
(1+1) GP*. It works with a population of size one and produces in each
iteration one single offspring. (1+1) GP* is defined in Algorithm 1 and accepts
an offspring only if it is strictly fitter than its parent.

Additionally, we consider a variant of (1+1) GP* which potentially applies
HVL-mutate' more than once when a child is generated. Thus, for
(1+1) GP*-single, we set the number of applications to k = 1, so that we
perform one mutation at a time according to the HVL-mutate' framework, and
for (1+1) GP*-multi, we choose k = 1 + Pois(1), so that the number of mutations
at a time varies randomly according to the Poisson distribution.

Figure 1: Example of the operators from HVL-mutate'. Panels: (d) After deletion, (e) Before substitution, (f) After substitution.

Algorithm 1 (1+1) GP*

1. Choose an initial solution X.

2. Set X' := X.

3. Mutate X' by applying HVL-mutate' k times. For each application,
randomly choose to either substitute, insert, or delete.

• If substitute, replace a randomly chosen leaf of X' with a new leaf
$u \in L$ selected uniformly at random.

• If insert, randomly choose a node v in X' and select $u \in L$ uniformly
at random. Replace v with a join node whose children are u and v,
with the order of the children chosen randomly.

• If delete, randomly choose a leaf node v of X', with parent p and
sibling u. Replace p with u and delete p and v.

4. If f(X') > f(X), set X := X'.

5. Go to 2.

We will analyze these two algorithms in terms of the expected number of 
fitness evaluations that is needed to produce an optimal solution for the first 
time. This is called the expected optimization time of the algorithm. 
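
The following sketch illustrates HVL-mutate' and the (1+1) GP* loop of Algorithm 1. It is not the authors' implementation: the nested-tuple tree encoding, the function names, the iteration budget `steps` (Algorithm 1 itself loops forever), and the choice to leave a single-leaf tree unchanged under deletion are all assumptions made for illustration. The fitness function is passed in and is assumed to be maximized.

```python
import math
import random

# A tree is an int (leaf) or a 2-tuple (left, right) for a join node J.

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        yield from paths(tree[0], prefix + (0,))
        yield from paths(tree[1], prefix + (1,))

def get(tree, path):
    for step in path:
        tree = tree[step]
    return tree

def replace(tree, path, new):
    """Return a copy of tree in which the node at path is replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace(children[path[0]], path[1:], new)
    return tuple(children)

def hvl_mutate(tree, n, rng):
    """Apply one sub-operation of HVL-mutate' with terminals 1..n."""
    op = rng.choice(("substitute", "insert", "delete"))
    all_paths = list(paths(tree))
    leaf_paths = [p for p in all_paths if not isinstance(get(tree, p), tuple)]
    if op == "substitute":
        return replace(tree, rng.choice(leaf_paths), rng.randint(1, n))
    if op == "insert":
        p = rng.choice(all_paths)
        v, u = get(tree, p), rng.randint(1, n)
        return replace(tree, p, (u, v) if rng.random() < 0.5 else (v, u))
    p = rng.choice(leaf_paths)          # delete
    if not p:                           # single leaf has no parent (assumption: no-op)
        return tree
    sibling = get(tree, p[:-1])[1 - p[-1]]
    return replace(tree, p[:-1], sibling)

def poisson1(rng):
    """Sample Pois(1) with Knuth's multiplication method."""
    limit, k, prod = math.exp(-1.0), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def one_plus_one_gp(initial, fitness, n, steps, multi=False, seed=0):
    """Run (1+1) GP*-single or -multi for a fixed budget of iterations."""
    rng = random.Random(seed)
    x, fx = initial, fitness(initial)
    for _ in range(steps):
        k = 1 + poisson1(rng) if multi else 1
        y = x
        for _ in range(k):
            y = hvl_mutate(y, n, rng)
        fy = fitness(y)
        if fy > fx:                     # accept only strict improvements
            x, fx = y, fy
    return x
```

For a measure that is to be minimized, one would pass the negated measure as `fitness`, so that "strictly fitter" still corresponds to a strictly larger value.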
2.4 The SORTING Problem

Given a set of n elements from a totally ordered set, sorting is the problem of
ordering these elements. We will identify the given elements by $1, \ldots, n$.
The goal is to find a permutation $\pi_{opt}$ of $1, \ldots, n$ such that

$$\pi_{opt}(1) < \pi_{opt}(2) < \ldots < \pi_{opt}(n)$$

holds, where $<$ is the order on the totally ordered set. W.l.o.g. we assume
$\pi_{opt} = id$, i.e. $\pi_{opt}(i) = i$ for all $i$, throughout this paper.

The set of all permutations $\pi$ of $1, \ldots, n$ forms a search space that has already
been investigated in [12] for the analysis of permutation-based evolutionary
algorithms. The authors of that paper investigate sorting as an optimization
problem whose goal is to maximize the sortedness of a given permutation. The
following fitness functions measuring the sortedness of a given permutation $\pi$
have been analyzed in [12].

• $INV(\pi)$, measuring the number of pairs in correct order (see footnote 1), which is the
number of pairs $1 \le i < j \le n$ such that $\pi(i) < \pi(j)$,

• $HAM(\pi)$, measuring the number of elements at correct position, which is
the number of indices $i$ such that $\pi(i) = i$,

• $RUN(\pi)$, measuring the number of maximal sorted blocks, which is the
number of indices $i$ such that $\pi(i + 1) < \pi(i)$ plus one,

• $LAS(\pi)$, measuring the length of the longest ascending subsequence, which
is the largest $k$ such that $\pi(i_1) < \ldots < \pi(i_k)$ for some $i_1 < \ldots < i_k$,

• $EXC(\pi)$, measuring the minimal number of pairwise exchanges in $\pi$ in
order to sort the sequence.
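
As an illustration of the first four measures, the following sketch evaluates them on a complete permutation given as a sequence of the values 1, ..., n. The function names are hypothetical, and the use of patience sorting for LAS is a choice made here for brevity, not something prescribed by the text.

```python
from bisect import bisect_left

def inv(perm):
    """Number of pairs i < j that are in correct order, i.e. perm[i] < perm[j]."""
    return sum(perm[i] < perm[j]
               for i in range(len(perm)) for j in range(i + 1, len(perm)))

def ham(perm):
    """Number of elements sitting at their correct (1-based) position."""
    return sum(value == pos for pos, value in enumerate(perm, start=1))

def run(perm):
    """Number of maximal sorted blocks: number of descents plus one."""
    return 1 + sum(perm[i + 1] < perm[i] for i in range(len(perm) - 1))

def las(perm):
    """Length of the longest ascending subsequence (patience sorting)."""
    tails = []          # tails[k] = smallest last value of an ascending run of length k + 1
    for value in perm:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)

perm = (3, 1, 2, 5, 4)
print(inv(perm), ham(perm), run(perm), las(perm))   # 7 0 3 3
```

For the sorted sequence of length n, the four functions return n(n-1)/2, n, 1, and n, respectively.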

Note that $EXC(\pi)$ can be computed in linear time, based on the cycle
structure of permutations. If the sequence is sorted, it has n cycles. Otherwise,
it is always possible to increase the number of cycles by exchanging an element
that is not sitting at its correct position with the element that is currently sitting
there. For any given permutation $\pi$ consisting of $n - k$ cycles, $EXC(\pi) = k$.
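
The cycle-based computation can be sketched as follows, assuming a complete permutation of the values 1, ..., n stored in a sequence; the function name `exc` is hypothetical.

```python
def exc(perm):
    """Minimal number of pairwise exchanges needed to sort perm (values 1..n)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if seen[start]:
            continue
        cycles += 1
        pos = start
        while not seen[pos]:
            seen[pos] = True
            pos = perm[pos] - 1     # move to the position where this element belongs
    return n - cycles

print(exc((2, 3, 4, 5, 1, 6)))      # one 5-cycle and one fixed point: 6 - 2 = 4
```

Every position is visited exactly once, which gives the linear running time mentioned above.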

We want to investigate sorting in the context of genetic programming. Note
that the fitness functions involve several interactions between the elements of
the permutation. Initial investigations on the computational complexity anal- 
ysis of genetic programming considered isolated problem semantics [3] and an 
important step is to investigate what happens if dependencies are involved. 
Therefore, the sorting problem modeled as an optimization problem seems to 
be ideal to get further rigorous insight into the optimization behavior of genetic 
programming. 

Considering tree-based genetic programming, we have to deal with the fact
that certain elements are not present in a current tree. We extend our notation
of permutation to incompletely defined permutations. To this end, we use $\pi$ to
denote a list of elements, where each element of the input set occurs at most
once. This is a permutation of the elements that occur in the tree. Furthermore,
we use $\pi(x) = p$ to get the position $p$ that the element $x$ has within $\pi$. In the case
that $x \notin \pi$, $\pi(x) = \bot$ holds. We adjust the definition of $\pi$ to later accommodate
the use of trees as the underlying data structure. For example, $\pi = (1, 2, 4, 6, 3)$
leads to $\pi(1) = 1$, $\pi(2) = 2$, $\pi(3) = 5$, $\pi(4) = 3$, $\pi(6) = 4$, and $\pi(5) = \bot$.

1 Originally, INV measures the number of pairs in wrong order. Our interpretation has
the advantage that we need no special treatment of incompletely defined permutations.

In order to deal with incompletely defined permutations, we need to complete 
the measures that are to be minimized, namely RUN and EXC. We assign a 
fitness of n + 1 to incompletely defined permutations. 

The set of primitives used in our GP-variants is the union of the following 
two sets: 

• F := {J}, J has arity 2, 

• L := {1, . . . ,n}. 

Algorithm 2 describes how the fitness of a tree is computed.

Algorithm 2 Derivation of f(X) for SORTING 

1. Derive a possibly incompletely defined permutation P of X: 

Init: I an empty leaf list, P an empty list representing a possibly incom- 
pletely defined permutation 

1.1 Parse X in order and insert each leaf at the rear of I as it is visited. 

1.2 Generate P by parsing I front to rear and adding ("expressing") a
leaf to P only if it is not yet in P, i.e. it has not yet been expressed.

2. Compute f(X) based on P and the chosen fitness function.



For example, for a tree X with (after the in-order parse) $I = (2, 2, 3, 4, 5, 1, 6, 3)$
and $|L| = 6$, we obtain $P = (2, 3, 4, 5, 1, 6)$, and the sortedness results are $INV(P) = 11$,
$HAM(P) = 1$, $RUN(P) = 2$, $LAS(P) = 5$, and $EXC(P) = 4$.
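
Step 1 of Algorithm 2 can be sketched as follows. The nested-tuple tree and the function names are assumptions made for illustration; the concrete tree shape below is just one of many trees with the leaf list from the example. HAM and RUN are included to show how a maximized and a minimized measure are evaluated on the derived list P.

```python
def leaves(tree):
    """Step 1.1: in-order parse, collecting the leaves from left to right."""
    if isinstance(tree, tuple):         # a join node is a 2-tuple (left, right)
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

def express(leaf_list):
    """Step 1.2: keep only the first occurrence of every element."""
    expressed, seen = [], set()
    for leaf in leaf_list:
        if leaf not in seen:
            seen.add(leaf)
            expressed.append(leaf)
    return expressed

def ham(p):
    """Number of expressed elements at their correct (1-based) position."""
    return sum(value == pos for pos, value in enumerate(p, start=1))

def run(p, n):
    """RUN is minimized; incompletely defined permutations get fitness n + 1."""
    if len(p) < n:
        return n + 1
    return 1 + sum(p[i + 1] < p[i] for i in range(len(p) - 1))

# One possible tree whose in-order leaf list is (2, 2, 3, 4, 5, 1, 6, 3).
tree = ((2, 2), ((3, 4), ((5, 1), (6, 3))))
P = express(leaves(tree))
print(P, ham(P), run(P, 6))             # [2, 3, 4, 5, 1, 6] 1 2
```

Only the first occurrence of an element matters, so the trailing 3 and the second 2 in the leaf list have no influence on the fitness.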

3 General Lower Bounds 

We start with a simple lower bound that is independent of the sortedness
measure used.

Theorem 1. Starting with a non-optimal solution, the expected optimization
time of the single- and multi-operation cases of (1+1) GP* on SORTING is
$\Omega(n^2)$ if the deletion of nodes is not allowed, and $\Omega(n)$ otherwise, where $n = |L|$.

In order to prove this, we have to bound the number of different mutations 
that can lead to an optimal tree. This is done in the following lemma. 
Lemma 2. For any given non-optimal tree X and its in-order parsed list of
leaves I, there exist at most three different sub-operations of HVL-mutate' that
can change X into an optimal tree.

Proof. The proof is done by investigating the different cases of near-optimal
individuals that can be improved to the optimal one in a single mutation. In
the following, we denote by $x\_\_x$ a sequence of leaves labeled $x$.

Case 1 An element $i \in L$ is missing in $I$.

• If $i = 1$ and $I = (2\_\_2, \ldots, n)$, then an insertion of 1 at position
1 results in an optimal tree. Furthermore, a substitution of the leftmost 2 by a 1
results in another optimal tree.

• If $1 < i < n$ and $I = (\ldots, i{-}1\_\_i{-}1, i{+}1\_\_i{+}1, \ldots)$, then an insertion of
$i$ between the $(i-1)$'s and the $(i+1)$'s results in an optimal tree. Alternatively,
substitutions of the rightmost $i-1$ or of the leftmost $i+1$ by $i$ yield
further optimal trees.

• If $i = n$ and $I = (\ldots, n{-}1\_\_n{-}1)$, then an insertion of $n$ at the
rightmost position, or a substitution of the rightmost $n-1$ by $n$, yields an
optimal tree.

Case 2 An element $x \in L$ is at an incorrect position $p$ in $I$, thus possibly preventing
other $x$ in the rest of the list from becoming expressed.

• If $p = 1$ and $I = (x, 1\_\_1, \ldots)$, it is possible to delete $x$, or to substitute
$x$ by a leaf labeled 1, resulting in optimal trees.

• If $1 < p \le n$ and $I = (\ldots, x, \ldots)$, with $x$ located between the leaves
labeled $i - 1$ and those labeled $i$, then it is possible to
delete $x$, or to substitute $x$ by $i - 1$ or by $i$.

□

□ 

Note that the investigated individuals represent maximal cases w.r.t. the
number of possible optimizing mutations. For example, for a tree X with
$I = (2, 3, \ldots)$, substituting the 2 by a 1 would not yield an optimal tree.

Now, it is possible to prove Theorem 1.

Proof of Theorem 1. We investigate the final step producing the optimal
individual. There, it is necessary that the last application of an HVL-mutate'
sub-operation produces the optimal individual. For the single-mutation variant, the
tree size at this stage is at least $n - 1 = \Omega(n)$. For the multi-mutation variant,
the size is at least $n - \sqrt{n} = \Omega(n)$ with high probability, as the probability to
perform more than $\sqrt{n}$ operations is $e^{-\Omega(\sqrt{n})}$.

Based on Lemma 2, for each non-optimal individual, there are at most a total
of three sub-operations to change it into the optimal one. For the sub-operation
insertion, the probability of success, i.e. of inserting the needed terminal at the
correct position, is bounded above by $\frac{1}{3} \cdot \frac{1}{n \, \Omega(n)}$. Similarly, the success probability
for a substitution is bounded above by $\frac{1}{3} \cdot \frac{2}{n \, \Omega(n)}$ (see footnote 2), and for a deletion by $\frac{1}{3} \cdot \frac{1}{\Omega(n)}$.
Hence, the probability of a success is bounded above by

$$\frac{1}{3} \left( \frac{1}{n \, \Omega(n)} + \frac{2}{n \, \Omega(n)} \right) = O\left(\frac{1}{n^2}\right)$$

if no deletion of nodes is allowed, and by

$$\frac{1}{3} \left( \frac{1}{n \, \Omega(n)} + \frac{2}{n \, \Omega(n)} + \frac{1}{\Omega(n)} \right) = O\left(\frac{1}{n}\right)$$

otherwise. Thus, the waiting times for single sub-operations are bounded from below
by $\Omega(n^2)$ and $\Omega(n)$, respectively. □



4 Upper Bound 

In this section we analyze the performance of our GP variants on one of the 
fitness functions introduced in Section 2.

We exploit a similarity between our variants and evolutionary algorithms
to obtain an upper bound. We use the method of fitness-based partitions,
also called the fitness-level method, to estimate the expected optimization time.
This method was originally introduced for the analysis of elitist evolutionary
algorithms (see, e.g., Wegener [13]) where the fitness of the current search
point can never decrease. The idea is to partition the search space into levels
$A_1, \ldots, A_m$ that are ordered with respect to fitness values. Formally, we require
that for all $1 \le i \le m - 1$ all search points in $A_i$ have a strictly lower fitness
than all search points in $A_{i+1}$. In addition, $A_m$ must contain all global optima.

Now, if $s_i$ is (a lower bound on) the probability of discovering a new search
point in $A_{i+1} \cup \cdots \cup A_m$, given that the current best solution is in $A_i$, the
expected optimization time is bounded by $\sum_{i=1}^{m-1} 1/s_i$, as $1/s_i$ is (an upper
bound on) the expected time until fitness level $i$ is left and each fitness level has
to be left at most once.
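
The bound itself is a one-line computation once the per-level success probabilities are known. The sketch below only illustrates the arithmetic; the function name and the numbers in the example are hypothetical and are not taken from the analysis in this paper.

```python
from fractions import Fraction

def fitness_level_bound(success_probs):
    """Upper bound sum_i 1/s_i on the expected optimization time.

    success_probs holds one lower bound s_i per non-optimal fitness level.
    """
    return sum(1 / s for s in success_probs)

# Hypothetical example: four non-optimal levels, each left with probability >= 1/10.
print(fitness_level_bound([Fraction(1, 10)] * 4))   # 40
```

Exact fractions are used here only to keep the illustration free of rounding; floating-point probabilities work the same way.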

Although the used HVL-mutate' operator is complex, we can obtain a lower 
bound on the probability of making an improvement considering fitness im- 
provements that arise from the HVL-mutate' sub-operations insertion and sub- 
stitution. In combination with fitness levels defined individually for the used 
sortedness measures, this gives us the runtime bounds in this section. 

Let us denote by $T_{max}$ the maximal tree size at any stage during the evolution
of the algorithm, and by $T_k$ the tree size when the fitness $k$ is achieved during
a run.

Theorem 3. The expected optimization time of the single- and multi-operation
cases of (1+1) GP* with INV is $O(n^3 T_{max})$.

2 Note that in some cases, two different substitutions may result in the optimal solution. 
Fitness      (1+1) GP*
function     single              multi

INV          $O(n^3 T_{max})$    $O(n^3 T_{max})$
HAM          $\infty$            $e^{\Omega(n)}$
RUN          $\infty$            $e^{\Omega(n)}$
LAS          $\infty$            $e^{\Omega(n)}$
EXC          $\infty$            $e^{\Omega(n)}$

Table 1: Summary of results. Note that unless another lower bound is given,
the lower bounds of $\Omega(n^2)$ and $\Omega(n)$ from Section 3 hold.



Proof. The proof is an application of the above-described fitness-based
partitions method. Based on the observation that $n(n-1)/2 + 1$ different fitness
values are possible, we define the fitness levels $A_0, \ldots, A_{n(n-1)/2}$ with

$$A_i = \{\pi \mid INV(\pi) = i\}.$$

As there are at most $n(n-1)/2$ advancing steps between fitness levels to be
made, the total runtime is bounded by the sum over all times needed to make
such steps.

We bound the times by investigating the case when only a particular insertion
of a specific leaf at its correct position achieves an increase of the fitness (see footnote 3).
The probability for such an improvement for (1+1) GP*-single is $p_k = \Omega\left(\frac{1}{n T_k}\right)$.
For (1+1) GP*-multi, the probability for a single mutation operation occurring
(including the mandatory one) is $1/e$; thus $p_k = \Omega\left(\frac{1}{n T_k}\right)$ in the multi-operation
case as well.

Therefore, since $T_k \le T_{max}$, the total optimization time is

$$\sum_{k=0}^{n(n-1)/2} O(n T_{max}) = O(n^3 T_{max}).$$

□



5 Worst Case Situations 

In the following, we examine our algorithms for the remaining measures of
sortedness. We present several worst-case examples for HAM, RUN, LAS, and
EXC that demonstrate that (1+1) GP*-single and (1+1) GP*-multi can get
stuck during the optimization process. This shows that evolving our GP system
is much harder than working with the permutation-based EA presented in
[12], where only the sortedness measure RUN leads to an exponential optimization
time.

3 For example, the tree with $I = (n\_\_n, 1, 2, \ldots, n-1)$ can only be improved (in a single step)
by inserting a leaf labelled 1 at the leftmost position.

We restrict ourselves to the case where we initialize with a tree of size linear
in n and show that even this leads to difficulties for the mentioned sortedness
measures. Note that a linear size is necessary to represent a complete
permutation of the given input elements.

For RUN and LAS, we investigate the following initial solution called $T_{w1}$
and show that it is hard for our algorithms to achieve an improvement.

$$\underbrace{n, n, \ldots, n}_{n+1 \text{ of these}}, 1, 2, 3, \ldots, n$$

Theorem 4. Let $T_{w1}$ be the initial solution to SORTING. Then the expected
optimization time of (1+1) GP*-single and (1+1) GP*-multi is infinite and
$e^{\Omega(n)}$, respectively, for the sortedness measures RUN and LAS.

Proof. We consider (1+1) GP*-single first. It is clear that, with a single
HVL-mutate' application, only one of the leftmost n's can be removed. For an
improvement in the sortedness based on RUN or LAS, all leftmost $n + 1$ leaves
have to be removed at once. This cannot be done by (1+1) GP*-single,
resulting in an infinite runtime.

(1+1) GP*-multi can only improve the fitness by removing the leftmost
$n + 1$ leaves. Hence, in order to successfully improve the fitness, at least $n + 1$
sub-operations have to be performed, assuming that we, in each case, delete
one of the leftmost n's. Because the number of sub-operations per mutation is
distributed as 1 + Pois(1), the Poisson random variable has to take a value of
at least n. This implies that the probability for such a step is $e^{-\Omega(n)}$ and the
expected waiting time for such a step is therefore $e^{\Omega(n)}$, which completes the
proof. □
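
The claim for (1+1) GP*-single can be checked exhaustively for small n. The sketch below works on the in-order leaf list instead of the tree, which is an assumption made here for brevity: every substitution, insertion, or deletion in the tree changes the leaf list by substituting, inserting, or deleting a single leaf. All names are hypothetical, and the check is only run for small values n >= 3.

```python
from bisect import bisect_left

def express(leaf_list):
    """Keep the first occurrence of every element (Algorithm 2, Step 1.2)."""
    seen, p = set(), []
    for x in leaf_list:
        if x not in seen:
            seen.add(x)
            p.append(x)
    return p

def run(p, n):
    """Minimized measure; incompletely defined permutations receive n + 1."""
    if len(p) < n:
        return n + 1
    return 1 + sum(p[i + 1] < p[i] for i in range(n - 1))

def las(p):
    """Length of the longest ascending subsequence (maximized measure)."""
    tails = []
    for x in p:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def neighbours(leaf_list, n):
    """All leaf lists reachable by one deletion, substitution, or insertion."""
    for i in range(len(leaf_list)):
        yield leaf_list[:i] + leaf_list[i + 1:]                 # deletion
        for u in range(1, n + 1):
            yield leaf_list[:i] + [u] + leaf_list[i + 1:]       # substitution
    for i in range(len(leaf_list) + 1):
        for u in range(1, n + 1):
            yield leaf_list[:i] + [u] + leaf_list[i:]           # insertion

def improvable(n):
    """True if a single sub-operation strictly improves RUN or LAS on T_w1."""
    start = [n] * (n + 1) + list(range(1, n + 1))               # leaves of T_w1
    p0 = express(start)
    for cand in neighbours(start, n):
        p = express(cand)
        if run(p, n) < run(p0, n) or las(p) > las(p0):
            return True
    return False

print([improvable(n) for n in range(3, 7)])   # [False, False, False, False]
```

This is a finite sanity check for a few values of n, not a substitute for the proof.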

Similarly, we consider the tree $T_{w2}$, which has as leaves the elements

$$\underbrace{n, n, \ldots, n}_{n+1 \text{ of these}}, 2, 3, \ldots, n - 1, 1, n$$

and show that this is hard to improve when using the sortedness measures
HAM and EXC.

Theorem 5. Let $T_{w2}$ be the initial solution to SORTING. Then the expected
optimization time of (1+1) GP*-single and (1+1) GP*-multi is infinite and
$e^{\Omega(n)}$, respectively, for the sortedness measures HAM and EXC.

Proof. We use similar ideas as in the previous proof. Again, it is not possible
for (1+1) GP*-single to improve the fitness in a single step, as all $n + 1$
leftmost leaves have to be removed in order for the rightmost n to become
expressed. Additionally, a leaf labeled 1 has to be inserted at the beginning, or
alternatively, one of the $n + 1$ leaves labeled n has to be replaced by a 1. This
results in a minimum number of $n + 1$ sub-operations that have to be performed
by a single HVL-mutate' application, leading to the lower bound of $e^{\Omega(n)}$ for
(1+1) GP*-multi. □
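
To get a feeling for the lower bound for (1+1) GP*-multi, the probability that the Poisson random variable takes a value of at least n can be evaluated numerically. The sketch below assumes the parameter 1 used in the definition of k; the function name and the truncation of the series after `terms` summands are choices made for illustration.

```python
import math

def poisson1_tail(n, terms=60):
    """P(Pois(1) >= n), summed directly to avoid cancellation for large n."""
    return sum(math.exp(-1.0) / math.factorial(j) for j in range(n, n + terms))

for n in (5, 10, 20):
    p = poisson1_tail(n)
    # p bounds the per-step probability of performing at least n + 1
    # sub-operations; 1 / p is the corresponding expected waiting time.
    print(n, p, 1.0 / p)
```

The tail probability is an upper bound on the success probability of a step, since the required sub-operations additionally have to hit the right leaves.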

6 Conclusions 

Genetic programming is successfully applied in numerous fields. However, its
computational complexity analysis has only recently begun. Thus far, only
problems with independent problem semantics have been analyzed. We
investigated a first problem with dependent semantics, namely the sorting problem.
Analyzing the setup of Durrett et al. [3] together with the fitness measures
proposed by Scharnow et al. [12], we have shown how the algorithms behave on
different measures of sortedness.

Our results are summarized in Table 1. For the measure INV, we have
presented polynomial bounds on the expected optimization time. For the remaining
measures HAM, RUN, LAS, and EXC, we have pointed out situations
where the algorithms get stuck. Our analyses give further rigorous insights
into the behavior of simple GP systems. Furthermore, they show that allowing
multiple occurrences of variables in the system may make the
optimization task much harder than for permutation-based evolutionary
algorithms, where only single occurrences are allowed.